INTRODUCTION


In 1968 a study was performed on the compilation and analysis of
lexical resources in information science.^ Primarily, that study was
intended to locate and collect lexical tools and to compare their content
as the initial phase of a research program on the language of information
science. While some commonality was observed from an aggregate of the
terms found in more than twenty sources, the diversity proved to be even
more pronounced. This was not unexpected. Authority files built for
a particular application will always differ since they are necessarily
constructed to reflect the interests of their sponsoring organization.

But a discipline can be presumed to have a body of knowledge in its own
right and a set of concepts and terminology used generally in its
literature and discourse.

This project has used a selected group of the same lexical tools,
all in English but in some instances translations from original
languages, to construct a language of the discipline. A number of
iterations may well be necessary before the product is able to stand close
scrutiny. Its more immediate utility will be derived from serving as
one input for the forthcoming research sponsored by OSTI and directed
by Coates and Mills, who will be experimenting with system-to-system
conversion through the Intermediate Lexicon of the Groupe d'Etude sur
l'Information Scientifique.

The Intermediate Lexicon was used in this study because it represents
a construct of information science in which a number of experienced
information specialists have participated. It divides the domain into
a set of general categories and groups, each with a precise, detailed
definition.


^Compilation and Analysis of Lexical Resources in Information Science.
Patricia O. Fuellhart and D.C. Weeks. Washington, BSCP, George
Washington University Communique 29-68, 15 June 1968. Research
sponsored in part by Air Force Office of Scientific Research, Office
of Aerospace Research Contract AF-AFOSR-1325-67 and in part by the
National Library of Medicine under grant 5T01-LM-00101-02.
BACKGROUND


Compatibility and Convertibility

The multidisciplinary nature of the field of scientific and technical
information is implicit in its genesis and continued interchange.
Fundamentally this discipline depends upon a number of others, such as
logic, linguistics, mathematics and computer technology, for its apparatus
and methods. Through application to other disciplines, information
science has contributed to the organization of those fields. The
development of information science and its applications have proceeded in
parallel, with the emphasis perhaps leaning to application carried out
with trial-and-error methods. One consequence has been the growth of
a body of practitioners who were scientists first and information
specialists afterward. This process is not reversible, suggesting that
formal training will continue to follow education in science as the
general pattern of entry into careers in information.

Although information science is still emerging as an established
discipline as the branch of science that deals with technical information
processing, the recognition of its foundations and the investigations
of its various aspects have produced a substantial corpus of
literature. As an object of study, this literature could be expected
to yield some interesting conclusions about the nature of the discipline
and the direction in which it may appear to be moving.

Organizations for documentation with the objective of collecting,
organizing, analyzing and disseminating information related to the
information science/documentation corpus of literature have noticeably
increased over the past two decades. Although the scope of their
collections and the activities of information analysis and storage may
differ according to the requirements of the user communities served
by these organizations, each documentation center is basically
concerned with the same comprehensive corpus of literature. Ideally,
each would profit from access to the document systems maintained by
other organizations. The degree to which this access presently exists
is a measure of duplication rather than system-to-system communication.

A major impediment to access by exchange rests with the indexing
tools and classification schemes, i.e. the indexing language used by
each system. The nature of the lexical resource - classification
schemes, thesauri, descriptor lists and documentary lexicons (such as
dictionaries and glossaries) - and its conceptual construct and
representation differ widely from organization to organization. However, these
ordering systems do possess common semantic attributes, as they must
since they represent concepts which appear in the literature of
information science. These concepts are representative not only of the
discipline of information science but also of the applications of such
concepts to the processing of scientific and technical information. It
is this commonality on which expectations of convertibility between
document systems via respective indexing languages are based.
Convertibility at the input-processing stage is a prerequisite to providing
organizations devoted to information science documentation with efficient
and effective access to the relevant document store of other, similar
organizations.

Convertibility has often been interpreted to mean standardization -
a not altogether welcome or desirable end since it implies a rejection
of local slant or bias in favor of uniformity. A more appropriate
objective is communication between systems without forcing the
participants to first become identical as the cost of admission.

If systems are to communicate in a language that is not altogether
their own, then clearly some common language must be available. Given
the need to formalize and control the indexing-retrieval language of an
information system, such languages are already meta-languages, placing
certain restraints on the vocabulary and syntax permitted for the
description of documentary information and for formulating queries.
Still another structured language or meta-language to serve as a
system-to-system communication language suggests even greater restraints.
This need not be true. But until such a resource is developed, little is
known about the total vocabulary of information science or the degree
to which numerous sub-sets are encompassed within it.

The research of this project has been aimed at developing an
initial, common vocabulary which will contribute to devising methods of
communication. The language that describes communication in a discipline
can never be as flexible or varied as that of speech or writing. It
must be restrictive and devoid of variety so that things and ideas that
are similar can be described in similar terms. The meta-languages
analyzed here are dissimilar in many respects - none of them fundamental,
but largely in detail. These lexical sources were not conceived as
definitive lexicons of information science; most reflect their
individual purpose and very likely suit those purposes very well.

Whether any lexicon can claim to be normative is problematic. Only
within an application can one set of decisions be made prescriptive.
But if several applications are joined, however loosely, some common
means of description will be essential for communication. This kind of
uniformity is promoted by efforts to achieve a set of descriptors that
can be used as a tool at more than one installation. In producing such
a lexicon, local slant and viewpoints are lost but commonalities become
more evident as a somewhat 'sterilized' language emerges.

If the processes of analysis and semantic reduction necessarily
remove idiosyncrasies from a number of sources, these same steps can be
said to yield a core language of a discipline. In the process,
individual descriptors tend to acquire smaller, more general intension and
thus increase their extension. One kind of utility that may be inherent
in an intermediate lexicon should be demonstrated by Coates and Mills
in their index conversion experiment, using an intermediate language to
transfer records from one system to another. Other uses can be found
in examining the common language as a resource for discovering
characteristics of the discipline itself.
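
The inverse relation between intension and extension noted above can be
illustrated with a small set-based sketch. It is an illustration only,
under stated assumptions: the document names, the attribute labels and
the function `extension` are hypothetical and are not drawn from any of
the lexical sources studied.

```python
# Hypothetical mini-collection: each document carries the attributes
# (elements of meaning) that an indexer found in it.
DOCUMENTS = {
    "doc1": {"index", "coordinate", "manual"},
    "doc2": {"index", "coordinate", "mechanized"},
    "doc3": {"index", "permuted", "mechanized"},
}


def extension(intension, documents=DOCUMENTS):
    """Return, sorted, the documents that exhibit every attribute in the intension."""
    return sorted(name for name, attrs in documents.items() if intension <= attrs)


# A descriptor with a detailed intension (three required attributes).
narrow = {"index", "coordinate", "manual"}
# The same descriptor after semantic reduction (one required attribute).
reduced = {"index"}

print(extension(narrow))
print(extension(reduced))
```

In this toy collection the reduced descriptor applies to every document
that the detailed one applies to, and to others besides, which is the
sense in which a smaller intension yields a larger extension.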

Information science has been more successful in establishing
methods of information handling both for mission oriented agencies and
for disciplines, particularly in physical science, than it has in
defining and structuring itself. One clear indicator of a mature
discipline is the uniform nature of formal education for entry into a
profession. Information science still lacks such uniformity. One
viewpoint sees the discipline as 'documentation' - the means of handling
information; another regards the discipline as one devoted to the study
of information, its characteristics and behavior. To a degree, the
two viewpoints are both reflected in the definition framed for the initial
study and used in this study as a measure for the assessment of terms:
"the study and development of conceptual, methodological, and
technological foundations for the control and distribution of substantive
information; these foundations apply equally to the collection, storage,
manipulation and retrieval of information and to the characteristics
of information itself."^

Despite the differences in stating what information science is,
there is a quite general similarity in the things it does when brought
to bear on the development of information systems. There, apart from
a few experimental programs, similarities have increased in recent
years. Special purpose equipment which was developed for some earlier
applications seems to have largely disappeared. General purpose
equipment, thesauri as authority files and coordinate indexing techniques
have become a common ground for scientific and technical information
systems design.


If practitioners are to develop an intellectual basis and a body
of knowledge, a mature language of the discipline is an essential tool
of communication. Communication between systems may be the easier place
to begin. Things and processes can be more tractable than people;
indexing languages, however imperfect, are free from the qualifiers
that make written and oral communication more stimulating, but often
deliberately vague and imprecise.


^Op. cit., Fuellhart and Weeks, p. 3.


Commonality 


In order to effect convertibility between many document systems
which have commonality of concepts, certain fundamental problems must
be addressed. Wall and Barnes^2 have indicated that the subject content,
or the scope of coverage of the system, creates a viewpoint which
affects the meaning of terms in an indexing vocabulary. They have
identified the fundamental problems which arise in effecting
convertibility between lexical resources as generic problems and semantic
problems.


Ideally, the work of combining a number of indexing tools into a
single lexicon should be performed by the authors of those same
indexing languages. When performed, as in this instance, by researchers,
the assignment of terms to one lexicon group or another, the
identification of synonyms, and the retention or rejection of certain
descriptors are largely arbitrary. Since the source materials were removed
from their local context, the original viewpoints were unknown. To a degree,
this simplifies the compilers' task, because varied interpretations
are lost and terms which appear to be identical are treated as though
they were. Only when hierarchical structure or explicit definition
indicates an interpretation is it possible to distinguish between
different meanings attached to terms.

Generic  Problems 


As Wall and Barnes indicated, the meaning of a term is dictated by
the class or category to which it belongs in the lexical resource.
Thus, problems in convertibility occur when classification schemes or
other structured lexical resources are so constructed that they vary
markedly in the specific levels of terms used. The levels of specificity
are predicated on the requirements of the user community and can be
evaluated or altered only within the framework of that environment.

There is one advantage available to the compiler of a conversion
instrument. If that instrument were already in being, a number of
unyielding conflicts would be certain to arise. His alternatives are
then to force agreement by ignoring different generic levels or to
leave some concepts unaccounted for among the equivalent terms. But
if the conversion language is in process of being assembled, then the
way remains open to find a solution that does not force equivalence where
none is fitting or leave unsolved the more difficult problems.


^2 E. Wall, et al., "Intersystem Compatibility and Convertibility of
Subject Vocabularies," Auerbach Corporation, Philadelphia, Penn.,
8 May 1969.
Semantic Problems


Semantic problems encountered in effecting convertibility between
lexical resources have been identified by Wall and Barnes as primarily
problems in synonyms and homographs. These basic semantic problems
are also encountered in document systems having basically the same
scope of coverage and presenting the same basic concepts. Again, the
concept term chosen from an array of synonyms and the meaning of the
homograph are determined by the user's requirements as shaped by this
environment.

In treating the semantic problems encountered in developing the
capability for conversion between vocabularies, concomitant generic
problems arise. Synonyms for basic concepts may vary the level of
specificity or detail. A lexical tool for effecting convertibility
between vocabularies must allow for the postable terms of one relevant
vocabulary to be equated with the acceptable synonyms or postable
terms of other vocabularies.

The ideal method for effecting convertibility between the lexical
resources of numerous organizations concerned with the field of
information science documentation and with the processing of scientific
and technical information is to develop a lexical resource which serves
as the object vocabulary through which terms from one vocabulary with
similar scope can be converted into terms of another vocabulary. This
method is preferable to selecting one lexical resource from many as
the object vocabulary. To use a vocabulary developed for a particular
environment would arbitrarily establish the viewpoint and conventions
of that vocabulary as the criteria of correctness. What is required is
the development of a lexical resource which encompasses the universe
of viewpoints from a general level to a specific level and allows for
convertibility between any term accepted in one vocabulary and any
comparable term in another vocabulary. Only if this requirement can be
satisfied can the exchange of resources and documents between systems
be realistically and economically realized.

A  CONSTRUCT  OF  INFORMATION  SCIENCE 


A substantial beginning in the development of such a resource has
arisen from a study on compatibility and convertibility of indexing
tools for literature concerned with scientific information by the
Groupe d'Etude sur l'Information Scientifique. The object of this
study was to delineate the basic concepts which would form the
framework for the organization of terms and specific concepts relative
to scientific information processing. The delineation of these
concepts and the development of this framework were based on the
comparison of extant lists of descriptors intended to be used for
processing scientific and technical information. Systematic reviews by
a group of experts and subsequent revisions based on their
recommendations and on comparisons of the preliminary lexicon to additional
lists of descriptors resulted in an "Intermediate Lexicon". An English
translation of this "Intermediate Lexicon" developed by the International
Working Party convened by the Groupe d'Etude sur l'Information
Scientifique^ has been prepared by E.J. Coates and D.C. Weeks.^ This
work sets forth in detail the history, objectives and methodology, as
well as a detailed presentation of the substance of the Intermediate
Lexicon. The following sections are a summary of this document.


^An Outline Intermediate Lexicon to Assist Interconversion Between Terms
Used in Various Indexing Languages in the Field of Scientific and Technical
Information Processing. Compiled by an International Working Party convened
by the Groupe d'Etude sur l'Information Scientifique (Marseille and Paris),
January 1968.

This Intermediate Lexicon is divided into six basic categories which
pertain to all documentation languages. These categories are:

A.  The SUBJECT of the document.

B.  The FIELD of science or technology of concern.

C.  The LANGUAGE in which the scientific information is
    conveyed.

D.  The COUNTRY or region with which the study is concerned.

E.  The PERIOD concerned.

F.  The FORM of the document being indexed.

The category of subject is of primary importance, as it is the
category which encompasses the universe of concepts which comprise the
conceptual, methodological, and technological foundations of the
processing of scientific and technical information. In the Intermediate
Lexicon these concepts or ideas are arranged in 25 main groups which were
observed in the descriptor lists and classification schemes concerned
with scientific and technical information processing.

The English language version by Coates and Weeks indicates the
rationale and the method whereby these 25 main groups were determined.
These groups were constructed from general categories common to the
various lexical resources reviewed. Compatibility was determined solely
on the basis of semantic correspondences between groups of descriptors
as they occurred in each list rather than as determined by the place
of each category in the structure or classification framework of each
list. In dealing with composite descriptors as they occurred in each
list, convertibility was effected only at the explicit semantic level.
Correspondence of combined descriptors as they occurred in each list was
sought within the appropriate groups, depending on the explicit
principal components of the descriptor in question.


^E.J. Coates and D.C. Weeks, An Outline Intermediate Lexicon to Assist
Interconversion Between Terms Used in Various Indexing Languages in the
Field of Technical Information Processing. English Translation.


The study did not attempt to determine correspondences for secondary
specifications of combined descriptors. Rather, correspondence was
constrained to appropriate assignments of the principal conceptual
components to the appropriate general categories. Although this
method results in considerable semantic reduction, it is appropriate
for the purpose of the study, i.e. to develop an Intermediate Lexicon
capable of effecting compatibility between multiple lexical resources
which may be expressed in numeric languages.

This method is appropriate for an Intermediate Lexicon responsive
to the varying viewpoints and multiple language constructs in that:

1.  It does not require the enumeration of all semantic
    factors or concepts which would occur from all viewpoints
    or all languages, but instead provides conceptual
    units useful for document analysis of the field in
    question regardless of viewpoint or language constraint;

2.  It allows for the precise definition of each semantic
    unit of the Lexicon, which in turn provides for
    convertibility between numerous lexical resources with
    varying levels of compound descriptors by providing
    a framework for determining correspondences through
    the elimination of ambiguities; and

3.  By precisely defining semantic units at broad semantic
    levels, each of which has nonetheless an independent
    meaning in the field of application, the Lexicon
    avoids use of overly broad terms which occur in natural
    language and which are actually comprised of several
    true concepts or descriptors.

The Intermediate Lexicon achieves its purpose in that it serves
as a means for establishing correspondences between lexicons. The
six principal facets provide for those aspects which are of concern
in performing documentary analysis.

As indicated previously, the Subject facet and its 25 main groups
are of prime importance to the Lexicon, as this facet represents at a
carefully defined broad level the universe of concepts which apply to
the field of information science and its application to the processing
of scientific and technical information. The following is the outline
of these 25 main groups:
CATEGORY "A"
SUBJECTS


List of "Groups" and Annexes


The headings are given successively in French and in English
separated by the sign /; the underlined portion is that which
might well be used as an abbreviated heading.


GROUP

1.  L'INFORMATION SCIENTIFIQUE & TECHNIQUE (I.S.T.): GENERALITES
    /SCIENTIFIC AND TECHNICAL INFORMATION IN SOCIAL CONTEXT

2.  PROFESSION
    /PERSONNEL

3.  LA SCIENCE DE L'I.S.T.
    /SCIENTIFIC & TECHNICAL INFORMATION PROCESSING AS A SCIENCE

4.  SCIENCES & TECHNIQUES CONNEXES
    /FRINGE TECHNIQUES AND SCIENCES

5.  ORGANISMES D'I.S.T.
    /ORGANIZATIONS: INTERNAL ORGANIZATION OF INDIVIDUAL UNITS

6.  ORGANISATION DE L'I.S.T.: STRUCTURE
    /ORGANIZATIONS AND ORGANIZED NETWORKS: STRUCTURE

7.  FONCTIONNEMENT DE L'I.S.T.
    /ORGANIZATIONS AND ORGANIZED NETWORKS: FUNCTIONING

8.  TYPES DE DOCUMENTS
    /DOCUMENTARY MATERIALS

9.  REPRODUCTION
    /REPROGRAPHY

10. TRAITEMENT DE L'I.S.T.: GENERALITES
    /SCIENTIFIC & TECHNICAL INFORMATION PROCESSING: GENERAL

11. COLLECTE DES INFORMATIONS
    /COLLECTION OF INFORMATION

12. ANALYSE DOCUMENTAIRE: GENERALITES
    /DOCUMENTARY ANALYSIS FOR RETRIEVAL: GENERAL

13. CATALOGAGE
    /CATALOGING

14. CONDENSATION; RESUMES
    /ABSTRACTING AND ANNOTATING

15. SYNTHESE
    /SURVEYS OF DOCUMENT CONTENT

16. CLASSIFICATION & INDEXATION
    /CLASSIFICATION & INDEXING

17. EXTRACTION, TABULATION: INDEX
    /INDEX MAKING BY EXTRACTING TERMS FROM INPUT

18. TRADUCTION
    /TRANSLATION

19. CODIFICATION
    /CODIFICATION

20. ENREGISTREMENT & STOCKAGE
    /INPUT & STORAGE

21. EXPLOITATION DOCUMENTAIRE
    /FILE EXPLOITATION

22. DIFFUSION
    /DISSEMINATION

23. MATERIEL

24. TERMINOLOGIE & LEXICOGRAPHIE
    /TERMINOLOGY & LEXICOGRAPHY

25. NORMALISATION
    /STANDARDIZATION
ANNEX

I.   PERSONNES
     /PERSONS

II.  ORGANISMES
     /ORGANIZATIONS

III. SYSTEMES
     /SYSTEMS

IV.  EQUIPEMENTS
     /EQUIPMENT

V.   PUBLICATIONS
     /PUBLICATIONS


The Intermediate Lexicon is in a preliminary state of development.
In accordance with its present purpose it does not enumerate detailed
specific descriptors which comprise the subsets of each of the 25
primary concepts. To do so would restrict the capability to establish
correspondence between lexicons using varying forms of related
concepts, e.g. single term or combined descriptors, by imposing arbitrary
conventions which would be incompatible with the unique requirements
of organizations developing the list.

A long range objective of the Working Party was to expand the
Intermediate Lexicon by establishing numerous equivalents between
elements in the individual descriptors. The proposed means for
establishing these equivalences is a concordance table where descriptors
in each of the lists L1, L2 ... Ln are related to each other and to the
Intermediate Lexicon Lo. The Intermediate Lexicon, Lo, "serves as the
intermediary language for translating Li to Lj and which is itself
inferred from the existing lists L1 ... Ln" [4]. The following
table and explanation are extracted from the English version and
represent the basic algorithm by which the Intermediate Lexicon can
be extended.


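[Concordance table, garbled in reproduction. Columns: Lo, L1, ... Li,
... Ln. Each row lists a term t of the Intermediate Lexicon Lo alongside
the corresponding terms in the lists L1 through Ln; some cells contain
dashes.]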
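
The concordance table described above can be pictured as rows that align
one Intermediate Lexicon term with its equivalents in each source list.
The sketch below is a minimal illustration under stated assumptions, not
the Working Party's procedure: the list names `L1` to `L3`, the sample
descriptors, and the functions `to_lexicon` and `translate` are all
hypothetical, and `None` is assumed to mark a list with no equivalent.

```python
# Hypothetical concordance table. Each row pairs one Intermediate Lexicon
# (Lo) heading with its assumed equivalents in three invented source lists.
CONCORDANCE = [
    {"Lo": "ABSTRACTING AND ANNOTATING",
     "L1": "abstracting", "L2": "abstracts preparation", "L3": None},
    {"Lo": "TRANSLATION",
     "L1": "translating", "L2": "translation", "L3": "machine translation"},
]


def to_lexicon(term, source, table=CONCORDANCE):
    """Return the Lo heading whose row holds `term` in list `source`, else None."""
    for row in table:
        if row.get(source) == term:
            return row["Lo"]
    return None


def translate(term, source, target, table=CONCORDANCE):
    """Convert a descriptor from `source` to `target` by passing through Lo."""
    lo_term = to_lexicon(term, source, table)
    if lo_term is None:
        return None  # the term is not recorded for the source list
    for row in table:
        if row["Lo"] == lo_term:
            return row.get(target)  # None when the target list has no equivalent
    return None


print(translate("abstracting", "L1", "L2"))
print(translate("abstracting", "L1", "L3"))
```

Because every list is related only to Lo, adding a further list in this
sketch means adding one column rather than a separate mapping to every
other list. The invented row for TRANSLATION also shows the generic
problem: a narrower descriptor in one list is equated with broader ones
in the others.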
PURPOSE


The purpose for which this study was made was twofold:


1.  To demonstrate how the Intermediate Lexicon may be extended
    by extracting descriptors from a compilation of English
    language (including translations to English) lexical resources
    which function as operational indexing tools by

    a.  relating these descriptors to the Intermediate Lexicon,
        and

    b.  relating them to each other by utilizing the concordance
        table construct;


2.  To examine the results to demonstrate:

    a.  the autonomy and universality of the subject categories
        in the Intermediate Lexicon; and

    b.  the capability for (empirically) developing the explicit
        hierarchy which is presently implicit in each subject
        category of the Intermediate Lexicon by virtue of the
        precise definitions developed for each category.


Extension of the Intermediate Lexicon. As has been previously stated,
there is no intent to develop the subject categories of the Intermediate
Lexicon to include a unique list of descriptors. However, in functioning
as a vehicle for effecting correspondences between indexing languages,
it must ultimately be responsive to concepts as represented by descriptors
selected for operational indexing tools. By relating descriptors from
tools which are expressed in the same language, English, to each other
and to the subject categories of the Intermediate Lexicon, it was
expected that a framework of specific descriptors could be constructed.
This framework would provide a corpus of descriptors which would serve
as a partial base from which the primary concepts delineated by the
subject categories may be extended to define the subsets of concepts
which are implicit or explicit in the primary concepts. The definition
of subsets of concepts would assist in extending the implicit hierarchy
inherent in the Intermediate Lexicon and would, as a result, increase
the efficiency of the Lexicon in effecting compatibility between
indexing systems.
Autonomy and universality of the subject categories. The development
of primary subject categories which represent broad concepts with
precise, independent meanings in each field of application produces
semantic autonomy. Semantic autonomy implies that the meanings of
concepts are stable and semantic problems such as synonymy and
homographs can be more readily resolved in effecting correspondences
between descriptor lists. The universality of the subject categories
is a quality that the Intermediate Lexicon possesses in that the
categories represent "the totality of ideas which make up the
conceptual tools of scientific and technical information." With the
assumption that the Intermediate Lexicon possesses these qualities
of autonomy and universality, it was expected that an analysis of the
relationships among descriptors extracted from the various indexing
resources and the relationships of these resources to the Lexicon
would demonstrate the presence of these qualities.


Empirical development of the hierarchy implicit in the subject category
groups. By virtue of the precise definitions presented for each subject
category group, there is an implicit hierarchy. The prime subject
categories of the Intermediate Lexicon should determine which descriptors,
of any level of semantic detail, are to be subsumed under that category.
Thus both broader conceptual subsets of the category and specific
detailed concepts would be arrayed within the same framework. By relating
these two levels of terms within the concordance table, it was expected
that descriptors representing broad conceptual subsets would be related
to specific detailed descriptors.


METHODOLOGY 


The methodology described in this section is related to the purposes
of this study: to assist in extending the Intermediate Lexicon, and to
analyze the data in order to verify both the autonomy and universality
of the subject categories of the Lexicon and the capability for
empirically developing an explicit hierarchy within the framework of
each subject category group.


Selection of lexical resources. Authority files utilized in the
processing of information science/documentation literature were selected
from the compilation of lexical resources cited in the introduction. This
constraint was imposed on the selection of lexical resources to be
included in the study with the purpose of replicating the problems of
bias and viewpoint likely to be encountered in effecting system-to-system
convertibility. According to this rationale, a total of fourteen
authority files were selected for inclusion in this study. Although
these authority files are identified by a description list code number
in the various tables, the authors have purposefully refrained from
identifying the source for each list in order to preclude evaluation
of any given list without regard to its individual purpose.


Extending the Intermediate Lexicon. In order to provide a partial basis
from which the Intermediate Lexicon may be extended, it was necessary
to perform three basic tasks:

•  Select descriptors from operational indexing languages which
   are available in the English language

•  Organize these descriptors within the appropriate subject
   category groups of the Intermediate Lexicon

•  Relate the selected descriptors to each other and to the
   Intermediate Lexicon in a concordance table.


Selection of descriptors. Since indexing languages use a multiplicity
of key terms and phrases to represent concepts, it was decided to use
these terms as they occur in the descriptor lists rather than to identify
concepts by initially grouping synonyms. The purpose of the Lexicon is
to provide a vehicle for effecting correspondences between descriptor
lists without placing external constraints on any list. In order to
achieve this purpose the Lexicon must be responsive to the descriptor
lists in their natural form. Thus, there was minimal intervention in
selecting word forms. The major form of intervention was to factor
redundant multiple subject headings into semantic units when such
factoring did not cause the loss of a compounded concept. In a limited
number of descriptors, it was arbitrarily determined that the compounded
concept could easily be represented by post-coordination and that the
purpose of the study was better served by factoring into key elements.

The limitation imposed by the procedure is that resultant data can
only serve as a partial base for extending the Lexicon in that not all
descriptors are retained in their natural context. It was felt that
this limitation was far outweighed by the capability to compare numerous
key terms in relation to each other so that those key terms which
occurred frequently might be considered as potential concept designations
in a later revision of the Lexicon. This limitation is further
offset by the analysis of the hierarchy implicit in the relationships
demonstrated in the concordance tables.


Organization of terms within subject category groups. The descriptors
selected from the indexing language tools were initially related to
the appropriate subject category groups. The appropriateness is
determined by 1) viewpoint or definition when explicitly presented in
the lexical resource and 2) the relationship of that viewpoint or term
to the definition of the subject category group.

Construction of a concordance table. A concordance table was constructed
for one subject category group in accord with the model developed in the
Lexicon outline as presented in Section III. By means of this table the
selected descriptors were related to each other and to the Intermediate
Lexicon.
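
The three tasks described above (selecting descriptors, organizing them
under one subject category group, and relating them in a concordance
table) can be sketched in code. The Python fragment below is only an
illustration under stated assumptions: the function name
build_concordance, the list code numbers and the sample descriptors are
hypothetical and are not data from the study.

```python
from collections import defaultdict


def build_concordance(lists):
    """Relate descriptors to the source lists in which they occur.

    `lists` maps a (hypothetical) list code number to the descriptors
    selected from that list for one subject category group. The result
    has one row per discrete descriptor: {descriptor: sorted list codes}.
    """
    table = defaultdict(set)
    for code, descriptors in lists.items():
        for term in descriptors:
            # Terms keep their natural form; only case and surrounding
            # whitespace are normalized so identical terms can be matched.
            table[term.strip().lower()].add(code)
    return {term: sorted(codes) for term, codes in sorted(table.items())}


# Illustrative input only; these are not the study's descriptor lists.
sample = {
    4: ["Classification", "Indexing"],
    6: ["Classification", "Dewey Decimal Classification"],
    8: ["Classification systems", "Dewey Decimal Classification"],
}

concordance = build_concordance(sample)
total = sum(len(descriptors) for descriptors in sample.values())

for term, codes in concordance.items():
    print(f"{term:32s} lists {codes}")
print(total, "descriptors selected,", len(concordance), "discrete")
```

A descriptor that occurs in more than one list appears once in the
table, with every list that carries it recorded against it.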


ANALYSIS 


The fourteen authority files reviewed for this research yielded a
total of 5852 descriptors according to the previously stated methodology
for selection. By treating those descriptors which occurred in more
than one list as a single descriptor or unit, 3002 unique or discrete
descriptors were identified.

Table 1 indicates those terms which were essentially eliminated
from further investigation. Terms that were accommodated elsewhere in
the Intermediate Lexicon were appropriately accumulated under headings
provided in the Lexicon outline. Those additional headings are:
peripheral disciplines - Category B; languages - Category C; localities -
Category D; and chronological frames or time periods - Category E. The
Annexes accommodate individual names, proper names and titles: Annex I -
Persons; Annex II - Organizations; Annex III - Systems; Annex IV -
Organizations; and Annex V - Publications. A small number of terms,
approximately 10% of the total, were rejected on the basis that they
would not function as index terms in the exchange of information between
information science/documentation oriented organizations. These terms
were of two types:

•  Terms that cannot stand alone but must be used as
   part of a compound concept.

      Activity     Acceptable   Automatic
      Appraisal    Informal     Implicit

•  Terms that are acceptable only in the local context
   of the vocabulary in which they are used.

      Absence      Agreement    Allusions

The rationale for excluding such terms is that it was not the objective
of this research to transfer every term from all lists into list L0 (the
Intermediate Lexicon) so that L0 would become an aggregate of all source
lists. The result of such an exercise would clearly be no more than a
random collection of terms and, rather than extend the Lexicon, would
instead destroy its validity.

Extension  of  the  Intermediate  Lexicon 


Of 3002 discrete terms reviewed, 2344 terms remained after excluding
terms of the type described above. These terms have been assigned to the
25 groups of the Intermediate Lexicon on the basis of the explicit
definitions provided for each subject group in the Intermediate Lexicon.
Tables in Appendix A contain the terms in their designated groups 1-25.


Table 1. The number of terms and percentage of total
terms deleted from further processing by assignment to
Lexicon categories and annexes or rejection.


OTHER SECTIONS OF LEXICON    NUMBER OF TERMS    % OF TOTAL TERMS

Categories   B                     119                3.96
             C                      17                 .57
             D                      28                 .93
             E                       3                 .09
Annex        I                       4                 .13
             II                     59                1.97
             III                    83                2.76
             IV                      3                 .09
             V                      28                 .93
Rejected                           314               10.46

TOTAL                              658               21.91
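
The counts in Table 1 can be cross-checked with a few lines of
arithmetic. The Python sketch below uses only the counts printed in the
table and the total of 3002 discrete descriptors; the variable names are
illustrative assumptions.

```python
# Counts of terms removed from further processing, as printed in Table 1.
removed = {
    "Category B": 119, "Category C": 17, "Category D": 28, "Category E": 3,
    "Annex I": 4, "Annex II": 59, "Annex III": 83, "Annex IV": 3,
    "Annex V": 28, "Rejected": 314,
}
DISCRETE_TERMS = 3002  # discrete descriptors identified in the study

removed_total = sum(removed.values())
remaining = DISCRETE_TERMS - removed_total
remaining_pct = round(100 * remaining / DISCRETE_TERMS, 2)

print(removed_total)   # terms assigned elsewhere in the Lexicon or rejected
print(remaining)       # terms left for the 25 subject groups
print(remaining_pct)   # percentage of discrete terms retained
```

The sums agree with the table total of 658 and with the 2344 terms
carried forward to the subject groups.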
An inspection of these tables in Appendix A reveals that the descriptors
subsumed under each category do present a realistic corpus of terms
which would prove useful in further efforts to extend the Intermediate
Lexicon. Semantic and generic problems are readily apparent, however.
Synonyms, near-synonyms and equivalents occur repeatedly, particularly in
the more detailed categories. In addition, numerous levels of generic and
specific terms occur, without all levels being delineated. For example,
in Table XVI, Classification and Indexing, a broad generic term (T) occurs,
i.e. Classification; a generic intension of this term (T') also occurs,
i.e. Classification systems; further intensions or specific terms (T")
also occur, e.g. Bliss classification, Brussels classification, Colon
classification, Decimal classification, etc. It is unrealistic to expect
that any group of T" is fully enumerated, given that the descriptor
lists reflect only the environments for which they were developed. It is
also apparent that not all possible T, T', or T"'s occur; a T", a specific
level descriptor, may occur in one authority list, an appropriate T
(broad generic term) may occur in the same and other authority lists, but
no list may present an appropriate T' (a more circumscribed generic term).
Thus, the base for extending the Lexicon is not wholly exhaustive and
requires not only additional inputs from related indexing schemes but also
intervention representing the consensus of qualified experts.

As predicted, additional semantic problems occur in relation to the
generic problem. Terms which might be viewed as T"' (an intension of
specific level terms) occur in some lists. For example, Dewey Decimal
Classification in Table XVI is, in one respect, an intension or more
specific term of Decimal Classification (T"). An alternate viewpoint
might consider the term Dewey Decimal Classification to be, in reality, a
more desirable synonym, and it is thus designated as a postable term.

A meta-language intended to effect convertibility between systems cannot
arbitrate this type of generic/semantic problem but must accommodate both
viewpoints. However, as the descriptors displayed in the 25 subject
groups cannot be assumed to be wholly exhaustive, neither can it be
assumed that all viewpoints demonstrating synonymy are represented in
the descriptor lists reviewed. Thus, although this research has produced
a substantial base from which the Lexicon may be extended, it is clear
that the Intermediate Lexicon must continue to accommodate inputs of
synonyms reflecting organizational viewpoint at future stages of
development and must extend the techniques for utilizing these terms in
system-to-system conversion efforts.

Universality  and  Autonomy 

The universality and autonomy of the subject categories defined in
the Intermediate Lexicon may be measured to an extent by the ability of
the Lexicon to accommodate descriptors from the various authority files
and by the degree to which terms can unequivocally be assigned to the
appropriate subject groups. Table 2 indicates the number of descriptors
Table 2. The number of terms assigned to the
subject groups of the Intermediate Lexicon.

                                                       NUMBER OF TERMS
SUBJECT GROUP    NUMBER OF TERMS    % OF TOTAL TERMS      DUPLICATED

      1                 17                 .57                 8
      2                106                3.53                 7
      3                 40                1.33                14
      4                218                7.26                45
      5                 68                2.27                 3
      6                 45                1.50                23
      7                100                3.33                23
      8                269                8.96                24
      9                 63                2.10                 7
     10                143                4.76                 5
     11                 39                1.30                 9
     12                 45                1.50                 3
     13                 88                2.93                59
     14                 32                1.07                11
     15                 13                 .43                 2
     16                297                9.89                98
     17                 29                 .97                 2
     18                 22                 .73                 6
     19                 84                2.80                 3
     20                163                5.43                 3
     21                112                3.73                 4
     22                 47                1.57                20
     23                240                7.99                25
     24                 55                1.83                 4
     25                  9                 .30                 3

   TOTAL             2,344               78.08               411
Table 3. Largest subject groups
ranked by number of terms.

SUBJECT
GROUP                                    NUMBER OF   % OF TOTAL   TERMS
NUMBER   SUBJECT GROUP NAME              TERMS       TERMS        DUPLICATED

  16     Classification and Indexing       297         9.89          98
   8     Documentary Materials             269         8.96          24
  23     Equipment and Software            240         7.99          25
   4     Fringe Techniques and Science     218         7.26          45
  20     Input and Storage                 163         5.43           3
  10     STINFO Processing: General        143         4.76           5
  21     File Exploitation                 112         3.73           4
   2     Personnel                         106         3.53           7
   7     Organizations: Functioning        100         3.33          23

TOTALS                                    1648        54.88         234



which were assigned to the 25 Subject groups delineated in the
Intermediate Lexicon, the percentage of total terms ascribed to each
Subject group, and the number of terms which might appear in the given
Subject group but are also ascribed to another Subject group or groups.

The premise that the property of universality is properly ascribed
to the Intermediate Lexicon is supported by the evidence that 78.08
percent of the total descriptors were attributed to the Subject groups
and 11.45 percent of the total terms were assigned to other categories
or annexes of the Lexicon. (See Table 1). It is significant that
approximately 55 percent of the terms occur in nine of the 25 Subject
groups. Table 3 presents these groups in descending order of the
number of descriptors ascribed to the groups. These Subject groups
appear to be those which are central to the discipline and which exact
greater attention in both the literature and the application of the
discipline. It should be noted that the four Subject groups which had
the highest number of terms were among those groups which had the greater
number of duplicated terms.
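
The ranking in Table 3 follows directly from the counts in Table 2. As
a hedged illustration, the Python sketch below re-derives the nine
largest groups and their combined share from the Table 2 counts; the
dictionary name and layout are assumptions made for the example.

```python
# Number of terms per subject group, as printed in Table 2.
terms_per_group = {
    1: 17, 2: 106, 3: 40, 4: 218, 5: 68, 6: 45, 7: 100, 8: 269, 9: 63,
    10: 143, 11: 39, 12: 45, 13: 88, 14: 32, 15: 13, 16: 297, 17: 29,
    18: 22, 19: 84, 20: 163, 21: 112, 22: 47, 23: 240, 24: 55, 25: 9,
}
DISCRETE_TERMS = 3002  # discrete descriptors identified in the study

# Rank the groups by number of terms, largest first, and keep nine.
ranked = sorted(terms_per_group.items(), key=lambda item: item[1], reverse=True)
top_nine = ranked[:9]
top_nine_terms = sum(count for _, count in top_nine)
share = 100 * top_nine_terms / DISCRETE_TERMS

print([group for group, _ in top_nine])
print(top_nine_terms, f"{share:.1f}%")
```

The direct quotient comes to roughly 54.9 percent; the 54.88 printed in
Table 3 is the sum of the rounded percentages of the nine groups.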

The demonstration of the quality of autonomy is affected by the
following factors: the relative fluidity of meaning in the language of
discourse of information science/documentation due to the
multidisciplinary origins of the terms; the method of extracting the
terms from the various lists, which to an extent stripped the terms of
certain semantic precision which was either implicit or explicit in the
arrangement of the term in the structure of the indexing language; and
finally the nature of the field of information science/documentation,
which requires that the Lexicon allow for terms representing both things
and processes to be entered in multiple subject groups despite the
precision of the definitions presented for each subject group. This
latter factor markedly affects a limited number of groups. For example,
Subject group 16, Classification and Indexing, presents
key-word-in-context indexing as an indexing technique although the
definition for Subject group 17, Extraction, clearly states that this
indexing product belongs in this category. The duplication of entries
of 411 terms indicates that this problem was encountered with sufficient
frequency to question the autonomy of the subject groups. In some cases
the decision to enter a term in multiple subject groups was arbitrary,
exceeding the constraints imposed by the definitions for each group, but
deemed meritorious in serving to extend the specific concepts of a given
group. As this problem was encountered in a limited number of groups,
it is expected that the need for duplication of terms will not
necessarily challenge the definitive exclusiveness of the subject
groups. Rather, the findings warrant further investigation into
extending the inclusiveness of subject groups and developing procedures
for processing multiple entry concepts in effecting system-to-system
convertibility.
Developing the Explicit Hierarchy


The approach to conversion from lists L1 to Ln into L0 while
simultaneously constructing L0, i.e. making the hierarchy of the
Intermediate Lexicon explicit, has been demonstrated by a selection of
significant terms from Group XVI, Classification and Indexing. This
group was chosen because it is central to the discipline, as evidenced
by the number of terms assigned, and should contain few if any concepts
that are not essential to scientific and technical information.

In Appendix B, examples are presented of the concordance table as
it may be used to develop the explicit hierarchy of the subject group.
In these examples, which are not exhaustive of the subject group itself,
emphasis is placed on classification and indexing. Where possible T',
the broad generic concept which makes up a subset of the subject group,
and its equivalences are indicated as the first term (terms) in the
content of the table. In those cases where T' was not presented in a
list, it was provided to assist in extending the Lexicon.

Together these terms, T', represent the number of discrete concepts
which L0 must accommodate, a relatively small number. The frequency of
their appearance in the resource lists shows that these concepts are
quite uniform in classification, but more diverse in indexing. This is
not unexpected; comprehensive classification has been formalized by
convention and is represented by a limited number of well-established
systems. Specialized classification schemes are not common and there
do not appear to be any terms in the Group that reflect only local use.

It  is  also  clear  that  the  generic  concept  classification  can  be 
divided  into  systems,  elements  and  methods.  Each  of  these  in  turn  becomes 
generic  to  a  set  of  specific  terms  which  are  largely  enumerative.  Thus 
we  have: 


Classification and Indexing (Group XVI)

T   Classification
    T'  Classifying
    T'  Classification Systems
        T"  Bliss Classification
        T"  Brussels Classification
        T"  Colon Classification
        T"  Dewey Decimal Classification
        T"  Library of Congress Classification
        T"  Universal Decimal Classification
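
One way to hold this display in machine-readable form is a map from each
term to its proximate generic term. The Python sketch below is an
assumption about representation, not a structure described in the study;
the names PARENT, level and label are hypothetical, and the placement of
the specific systems under Classification Systems follows the T, T', T"
example given earlier in the text.

```python
# Proximate generic term for each entry in the Group XVI example.
PARENT = {
    "Classifying": "Classification",
    "Classification Systems": "Classification",
    "Bliss Classification": "Classification Systems",
    "Brussels Classification": "Classification Systems",
    "Colon Classification": "Classification Systems",
    "Dewey Decimal Classification": "Classification Systems",
    "Library of Congress Classification": "Classification Systems",
    "Universal Decimal Classification": "Classification Systems",
}


def level(term):
    """Count the generic steps between `term` and the top-level term."""
    depth = 0
    while term in PARENT:
        term = PARENT[term]
        depth += 1
    return depth


def label(term):
    """Render the level in the T, T', T" notation used in the text."""
    return "T" + {0: "", 1: "'", 2: '"'}[level(term)]


for name in ["Classification", "Classification Systems", "Colon Classification"]:
    print(label(name), name)
```

Keeping only the proximate generic term for each entry is enough to
recover every level of the display by walking upward.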

The generic concept of indexing provides, on the basis of the terms
available in the lexical resources, essentially the same division as the
concept of classification, i.e. types of indexes, systems, elements,
techniques. The diversity in the generic concept indexing is pronounced
when the number of detailed terms, T", is observed under indexing
techniques. Techniques are not as firmly established for indexing as they
are for classification. The number of T" entries is three times as great
as for classification techniques and a number of equivalent terms are
encountered. Conversely, the T" entries under Classification systems are
more extensive than the T" entries under Indexing systems.

The obvious limitations in these examples are due to the viewpoint
or bias of the lexical resources or authority files from which the terms
were selected. Even when presented with limited examples of the
hierarchy made explicit through the application of the concordance, it
is necessary to superimpose higher level terms of the T' level in order
to obtain the balance requisite in an hierarchical display. It is not,
however, the purpose of the Intermediate Lexicon to superimpose an
arbitrary structure with which all other authority files used in effecting
system-to-system convertibility must demonstrate a degree of conformity.

In accepting this constraint in extending the Lexicon so that it possesses
not only intension but structure, the most obvious solution is to propose
(or purport) that the "missing links" will be supplied through the
inclusion of more lists. This is, at best, a partial answer, adequate
perhaps for approaching the problem of developing "total enumeration".

This proposed approach, as a research oriented approach, is doomed to
only partial success since it assumes optimization of not only the
Intermediate Lexicon but also the source lists. This ignores the
constraints imposed on the Intermediate Lexicon as it actually functions
in a realistic environment in effecting system-to-system conversion.

The conveniently available generic concepts in this section suggest
that a well-ordered classification could be constructed from a collection
of resource lists and produce a fair representation of the entire
discipline. This would be true enough if the collection of resource lists
were expanded to include those which scientific and technical information
has cannibalized. There are very few autonomous concepts that exist only
or originally within the discipline. The notion of information retrieval
and its related terms may be all that remains when the rest are traced to
the contributing discipline. The origin of these concepts is not presently
of more than academic interest. They are the basis for the language of
discourse in this domain and it is the assembled lexical resources that
form the object of study.

It is immediately evident that the conversion will operate with
advantage only in two respects: the one-for-one match and the transfer
of specific to the proximate general level as shown in the example of
Figure 1. Here List 6 inputs Dewey Decimal Classification and Library
of Congress Classification, both specified in the Lexicon. List 4 can
accept these terms as outputs only under Classification. This is
appropriate, although not without difficulties, since neither Dewey nor
Library of Congress has autonomous entry in List 4. In effect List 4
has a T (Classification) and T", T" (Dewey, LC) without an appropriate
T' (Classification systems). This has marked implications for List 8's
ability to receive inputs from List 4. List 8, with the appropriate
T' (Classification systems) and the specified T", T" (Dewey, LC), cannot
receive related input at the T" level, specifically Bliss and Brussels
Classifications, because these T", T" are subsumed under T'
(Classification Systems) in List 8. Thus List 8 cannot receive from
List 4 below the level of T (Classification). In short, except where a
one-to-one ratio exists, when a T' is present in one list and not in
another, system-to-system convertibility cannot be adequately effected.

Figure 1. An example of conversion paths
between two indexing systems by way of the
Intermediate Lexicon.

The underlying assumption here is that each list has and desires
access to the Intermediate Lexicon only so that, in effect, potentially
massive amounts of processing are performed according to the
decision-making algorithm, the Intermediate Lexicon, in order to achieve
convertibility between the document stores on a cost/effective basis. The
decision-making functions can be amplified because each system clearly
has an option of rejecting specific terms not used in its vocabulary or
of converting by an equivalent to the proximate generic level. Electing
to receive at the proximate generic level has the obvious disadvantage
of introducing unspecified information into the receiving system when
the receiving system desires specificity of the subsumed concepts.
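
The two options described here, rejecting a term or converting it to the
proximate generic level, can be sketched as a small routine. This is an
illustrative reading of the procedure and not the study's algorithm: the
function name convert, the PARENT map and the sample receiving
vocabulary are hypothetical.

```python
# Hypothetical fragment of the Lexicon hierarchy:
# each term maps to its proximate generic term.
PARENT = {
    "Classification Systems": "Classification",
    "Dewey Decimal Classification": "Classification Systems",
    "Library of Congress Classification": "Classification Systems",
}


def convert(term, receiving_vocabulary, allow_generic=True):
    """Convert `term` for a receiving list by way of the hierarchy.

    Returns the term itself on a one-for-one match, otherwise the nearest
    generic term held by the receiving list, or None if the term is
    rejected or no generic term is available.
    """
    if term in receiving_vocabulary:
        return term                  # one-for-one match
    if not allow_generic:
        return None                  # receiving system rejects the term
    current = term
    while current in PARENT:         # climb toward the generic level
        current = PARENT[current]
        if current in receiving_vocabulary:
            return current
    return None


# A receiving list that holds only the broad term (assumed for illustration).
receiving = {"Classification"}

generalized = convert("Dewey Decimal Classification", receiving)
rejected = convert("Dewey Decimal Classification", receiving, allow_generic=False)
print(generalized)
print(rejected)
```

In the first call the specific term arrives only as the broad term, which
is the loss of specificity that the paragraph warns about; in the second
the receiving system declines the term altogether.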

We are left with the question: What will be the consequences in
real applications? The problem seems clear in a research situation but
the probability of this situation occurring with troublesome frequency
could be minimal owing to the viewpoint or bias of the respective lists.
It appears that specific T"'s, which in a given list lack a proximate
generic level T', reflect a parochial interest of that user community.
In that event, specific T"'s should not be converted to the supposedly
proximate generic level without the express requirement to do so. Again,
the likelihood of organizations with divergent viewpoints requiring total
convertibility between document stores is open to question. The
requirement is more likely to be for conversion of segments of a list, a
requirement in which special conditions may be set for term inclusion
and exclusion. This aspect does, however, violate the assumption that in
the conversion process, a given list has no requirement to be concerned
with the viewpoint of other lists. It seems prudent to suggest that any
system receiving exchange through the Intermediate Lexicon should do so
only with the full awareness of the impact that other system orientations
will have on their own outputs.